ABSTRACT

The estimation of the parameters of a linear statistical 
model is generally accomplished by the method of least 
squares. However, when the method of least squares is 
applied to nonorthogonal problems the resulting estimates 
may be significantly different from the true parameters. 

The method of ridge regression may provide better estimates 
in these cases; however, a probability distribution of the 
ridge estimator is presently not known. The form of such a 
distribution is dependent upon how the ridge parameter, k, 
is selected. Two possible objective methods of choosing k 
are examined to determine if either one leads to a useful 
probability distribution.

I. BACKGROUND



The following conventions will be used throughout. 

Unless otherwise noted, capital letters and Greek letters 
will refer to matrices and vectors while lower case letters 
will refer to scalars. 

A. INTRODUCTION 

The use of linear statistical models is widespread in 
scientific fields of all kinds. Generally, the linear 
statistical model is postulated as 

$$ Y = X\beta + \varepsilon \tag{1} $$

where Y is an n x 1 vector of n observed values of a
dependent variable, X is an n x p matrix containing n
values for each of p predictor (independent) variables,
$\beta$ is a p x 1 vector of p unknown parameters (or
coefficients) to be estimated from data, and $\varepsilon$
is an n x 1 vector representing experimental errors.
Usually, the experimental error is assumed to have a
multivariate normal distribution with mean equal to zero
and variance-covariance matrix equal to $\sigma^2 I$, where
$\sigma^2$ is the scalar value of the common variance of
the experimental errors. This assumption will be made
throughout this paper.

In practice, the modeling problem is to estimate the
parameters $\beta$ from data Y and X. The most common
method of doing this is called least squares estimation or
sometimes ordinary least squares (OLS). The latter
designation will be used in this paper.

Under certain fairly general and common conditions
OLS is an adequate method of estimating $\beta$. However,
when the data are "ill-conditioned" or nonorthogonal, OLS
may yield poor estimates of the true parameters.

Ridge regression (RR) has been proposed [Ref. 1] as an 
alternative estimation method that might yield better esti- 
mates under conditions where OLS does poorly. 



B. ORDINARY LEAST SQUARES 

For convenience, it is assumed that the elements of X 
are scaled such that X'X has the form of a correlation 
matrix. This is done by forming from each element $x_{ij}$
a new element $x'_{ij}$ such that

$$ x'_{ij} = (x_{ij} - \bar{x}_j)/s_{x_j} \tag{2} $$

where $\bar{x}_j$ is the mean value of the elements of the
j-th independent variable and $s_{x_j}$ is its standard
deviation times an appropriate constant such that the
diagonal elements of X'X are equal to one. The OLS
estimator of $\beta$ is then



$$ \hat{\beta} = (X'X)^{-1} X'Y \tag{3} $$

so long as $(X'X)^{-1}$ exists.¹ The estimator $\hat{\beta}$
is unique, unbiased and is the best linear unbiased
estimator (BLUE) of $\beta$ (it has the minimum variance
among all linear unbiased estimators of $\beta$) so long as
$E(Y) = X\beta$ and $E[(Y - X\beta)(Y - X\beta)'] = \sigma^2 I$
where $\sigma^2$ is a scalar, as assumed previously.

The OLS estimator $\hat{\beta}$ is commonly used and is
particularly useful when it can be assumed that Y is a
multivariate normal vector with mean vector $X\beta$ and
covariance matrix $\sigma^2 I$. In this case, it can be
shown² that the maximum likelihood estimator of $\beta$ is
the same as the OLS estimator and furthermore, since
$\hat{\beta}$ is a linear function of the elements of Y,
$\hat{\beta}$ has a multivariate normal distribution with
mean vector equal to $\beta$ and covariance matrix
$\sigma^2 (X'X)^{-1}$. This latter characteristic of
$\hat{\beta}$ allows the use of hypothesis tests and the
computation of confidence bounds.

Unfortunately, in some cases X'X is "ill-conditioned" 
and OLS yields poor estimates. This typically occurs when 
an experiment is poorly designed or there are economic or 
physical restraints causing strong correlations among the 
predictor variables. In this case X’X, in its correlation 
matrix form, will not be orthogonal. 



1 For a derivation and details of properties of the OLS
estimator, see, for example, Ref. 2.

2 For example, see Ref. 2, page 182.

Hoerl and Kennard [Ref. 3] address the eigenvalues of
X'X (denoted by $\lambda_j$, j = 1, 2, ..., p) and point out
that nonorthogonal data are characterized by the smallest
eigenvalue ($\lambda_{min}$) being much less than unity and
that, since $\sigma^2/\lambda_{min}$ is a lower bound for the
mean squared distance between $\hat{\beta}$ and $\beta$,
then for X'X nonorthogonal, the difference between
$\hat{\beta}$ and $\beta$ has a high probability of being
large. When X'X is nonorthogonal $\hat{\beta}$ is
characterized by one or more of the following
difficulties, for example:

(1) large variance,

(2) large magnitude of residual errors,

(3) incorrect signs of parameter estimates.
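The eigenvalue argument can be made concrete with a minimal sketch. It assumes only two standardized predictors with sample correlation r, so that X'X = [[1, r], [r, 1]] with eigenvalues 1 + r and 1 - r. The function name and the values of r are illustrative assumptions, not part of the original analysis. The expected squared distance is taken as $\sigma^2$ times the trace of $(X'X)^{-1}$, which follows from the covariance matrix of the OLS estimator.

```python
def ols_distance_summary(r, sigma2=1.0):
    """Summarize OLS accuracy for two standardized predictors with correlation r.

    Assumes X'X = [[1, r], [r, 1]], whose eigenvalues are 1 + r and 1 - r.
    """
    eigenvalues = (1.0 + r, 1.0 - r)
    lam_min = min(eigenvalues)
    # E[(b - beta)'(b - beta)] = sigma^2 * trace((X'X)^-1) = sigma^2 * sum(1 / lambda_j)
    expected_sq_distance = sigma2 * sum(1.0 / lam for lam in eigenvalues)
    # Lower bound quoted from Hoerl and Kennard: sigma^2 / lambda_min
    lower_bound = sigma2 / lam_min
    return lam_min, expected_sq_distance, lower_bound


for r in (0.0, 0.9, 0.99):
    lam_min, dist, bound = ols_distance_summary(r)
    print(f"r={r:4.2f}  lambda_min={lam_min:.2f}  E[L^2]={dist:8.3f}  bound={bound:8.3f}")
```

As r approaches one the smallest eigenvalue approaches zero and the bound dominates the expected squared distance.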

C. RIDGE REGRESSION 

A. E. Hoerl suggested [Refs. 1 and 4] that the large
variance of $\hat{\beta}$ for nonorthogonal data could be
reduced by the addition of a constant k > 0 to the diagonal
elements of X'X, thus yielding

$$ \hat{\beta}^* = (X'X + kI)^{-1} X'Y \tag{4} $$

as an estimator. Equation (4) is derived in Appendix A.
Note that for k equal to zero the estimator $\hat{\beta}^*$
is equal to the OLS estimator $\hat{\beta}$. Therefore, OLS
can be thought of as a special case of ridge regression.³
Hoerl suggested the name "ridge regression" for this
procedure because of its mathematical similarity to some
of his earlier work [Ref. 5] on quadratic response
functions.

3 See Appendix B for a discussion of an even more
general estimator.
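Equation (4) can be evaluated directly when p = 2, since the 2 x 2 inverse has a closed form. The following is a minimal sketch under that assumption; the function name `ridge_estimate` and the numerical inputs are hypothetical and chosen only for illustration.

```python
def ridge_estimate(xtx, xty, k):
    """Return (X'X + kI)^-1 X'Y for p = 2 using the closed-form 2 x 2 inverse."""
    a, b = xtx[0][0] + k, xtx[0][1]
    c, d = xtx[1][0], xtx[1][1] + k
    det = a * d - b * c
    return [(d * xty[0] - b * xty[1]) / det,
            (a * xty[1] - c * xty[0]) / det]


# Hypothetical inputs: two predictors with correlation 0.9 and
# correlations 0.8 and 0.7 with a standardized response.
xtx = [[1.0, 0.9], [0.9, 1.0]]
xty = [0.8, 0.7]

ols = ridge_estimate(xtx, xty, 0.0)    # k = 0 reproduces the OLS estimator
ridge = ridge_estimate(xtx, xty, 0.1)  # k = 0.1 gives a ridge estimate
print("OLS  :", [round(b, 4) for b in ols])
print("ridge:", [round(b, 4) for b in ridge])
```

With these hypothetical inputs the second coefficient changes sign between k = 0 and k = 0.1.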

1. Mean Squared Error

The rationale behind using the ridge estimator is
to minimize the mean squared error (MSE) associated with
the estimate instead of minimizing the sum of squares of
residuals as is done in OLS.⁴ Hoerl and Kennard show
that the mean squared error is given by

$$ MSE = Variance + (Bias)^2 \tag{5} $$

Furthermore, they show that variance is a monotonically
decreasing function of k, that the squared bias is a
monotonically increasing function of k and that the rate
of change of variance, for nonorthogonal data and small k,
is considerably larger than the rate of change of the
squared bias. Figure 1 is a graphical illustration of
these relationships. Hoerl and Kennard argue that it is
possible to find some $k \neq 0$ such that the variance is
greatly reduced while only a small amount of bias is
introduced, thus yielding a smaller MSE than if OLS (k = 0)
were used. Indeed they show that if $\beta'\beta$ is
bounded, then such a k always exists. Thus, proper use of
ridge regression on nonorthogonal data ensures a reduced
MSE of estimation.

4 In the case of unbiased estimation, which OLS is,
these are equivalent criteria.
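The trade-off in equation (5) can be sketched numerically. Working in the coordinates of the eigenvectors of X'X, the ridge estimator of equation (4) has total variance $\sigma^2 \sum_i \lambda_i/(\lambda_i + k)^2$ and squared bias $k^2 \sum_i \alpha_i^2/(\lambda_i + k)^2$, where $\alpha$ denotes the true coefficient vector expressed in those coordinates. The eigenvalues, the vector `alpha`, and the function name below are assumptions made for illustration.

```python
def ridge_mse_terms(eigenvalues, alpha, sigma2, k):
    """Return (variance, squared bias, MSE) of the ridge estimator at a given k.

    eigenvalues: eigenvalues of X'X; alpha: true coefficients in the
    eigenvector coordinates (a hypothetical input, unknown in practice).
    """
    variance = sigma2 * sum(lam / (lam + k) ** 2 for lam in eigenvalues)
    bias_sq = k ** 2 * sum(a ** 2 / (lam + k) ** 2
                           for lam, a in zip(eigenvalues, alpha))
    return variance, bias_sq, variance + bias_sq


# Hypothetical nonorthogonal case: eigenvalues 1.9 and 0.1.
for k in (0.0, 0.1, 0.5):
    var, bias_sq, mse = ridge_mse_terms((1.9, 0.1), (1.0, 0.5), 1.0, k)
    print(f"k={k:3.1f}  variance={var:7.3f}  bias^2={bias_sq:6.3f}  MSE={mse:7.3f}")
```

For these assumed values the MSE falls from about 10.5 at k = 0 to about 3.0 at k = 0.1, the variance reduction far outweighing the bias introduced.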

The problem remains to select an appropriate value of k.
Hoerl and Kennard [Ref. 6] suggest the use of two graphical
devices as aids to determining an appropriate value of k.
The first is the ridge trace, a two-dimensional plot of the
elements of $\hat{\beta}^*$ as functions of k and the second
is an estimate of the squared length of the coefficient
vector, $\hat{\beta}^{*\prime}\hat{\beta}^*$. The ridge trace
is used to gain an understanding of the underlying
correlations between the various predictor variables while
the plot of $\hat{\beta}^{*\prime}\hat{\beta}^*$ is used to
subjectively determine a suitable range of values of k.
A typical ridge trace is illustrated in Figure 2 and a
typical plot of $\hat{\beta}^{*\prime}\hat{\beta}^*$ is
depicted in Figure 3. Notice that
$\hat{\beta}^{*\prime}\hat{\beta}^*$, in Figure 3, decreases
steeply for small k (k < 0.2) but in the range about 0.3 to
0.4 has become much less sensitive to further increases
in k.
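Both graphical devices can be tabulated rather than plotted. The sketch below assumes p = 2 and reuses hypothetical values of X'X and X'Y; the function names are illustrative, and the grid of k values is an arbitrary choice.

```python
def ridge_estimate(xtx, xty, k):
    """Return (X'X + kI)^-1 X'Y for p = 2 using the closed-form 2 x 2 inverse."""
    a, b = xtx[0][0] + k, xtx[0][1]
    c, d = xtx[1][0], xtx[1][1] + k
    det = a * d - b * c
    return [(d * xty[0] - b * xty[1]) / det,
            (a * xty[1] - c * xty[0]) / det]


def ridge_trace(xtx, xty, ks):
    """Return rows of (k, coefficients, squared length) for each k in ks."""
    rows = []
    for k in ks:
        b = ridge_estimate(xtx, xty, k)
        rows.append((k, b, sum(bi * bi for bi in b)))
    return rows


# Hypothetical inputs: predictor correlation 0.9.
xtx = [[1.0, 0.9], [0.9, 1.0]]
xty = [0.8, 0.7]
ks = [i / 10 for i in range(11)]  # k = 0.0, 0.1, ..., 1.0

for k, b, sq_len in ridge_trace(xtx, xty, ks):
    print(f"k={k:3.1f}  b1={b[0]:7.4f}  b2={b[1]:7.4f}  length^2={sq_len:6.4f}")
```

In this example the squared length decreases steadily with k, while the second coefficient starts negative, rises, and then slowly declines, so an individual trace need not be monotonic.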

FIGURE 3

2. Alternative Methods of Choosing k

The previously described method of subjectively
choosing a suitable value of k is the current method in
use and appears to be useful. A major problem arises,
however, because the method denies to the analyst
knowledge of the probability distribution of
$\hat{\beta}^*$ and, therefore, any probabilistic inferences
concerning the resulting estimator. Hoerl and Kennard have
suggested a general form of ridge regression [Ref. 3] and
an iterative method of determining k. In addition, Hemmerle
[Ref. 7] has derived a closed form solution based on this
method. Another possibility is to use the ridge trace or
the plot of $\hat{\beta}^{*\prime}\hat{\beta}^*$
quantitatively to calculate a point value for k in such a
way that the marginal probability distribution,
$f_{\hat{\beta}^*}$, may be determined. Two such methods
using the ridge trace are examined in the next section.

II. PROPOSED OBJECTIVE RULES FOR CHOOSING k



The slope (rate of change) of the ridge trace curves 
or the absolute change of the ridge trace curves over a 
specified interval may be used to determine a value of the 
ridge parameter, k, objectively. These criteria are 
discussed here. 

Either of these criteria may be sensitive to the
behavior of each coefficient $\hat{\beta}_i^*$. In general,
the $\hat{\beta}_i^*$ are not monotonic in k, although they
all approach zero as k is increased without bound. It has
been noted by Marquardt and Snee [Ref. 8] that it is not
uncommon for one or more $\hat{\beta}_i^*$ to increase in
absolute value as k is increased. (See, for example,
Figure 2.) Therefore, the ridge trace should be examined by
the analyst to detect any behavior of $\hat{\beta}_i^*$ that
might adversely affect the proper selection of k even
though the ridge trace is not to be used directly to
select a specific value of k.

It is clear that $\hat{\beta}^*$ is distributed
multivariate normal if Y is distributed multivariate normal
and a specific value of k is selected a priori. However,
whenever the value of k is dependent on a data sample its
value will not generally be the same for each data sample.
Therefore, k is a random variable. Let K denote this
(scalar) random variable.

The marginal probability distribution of $\hat{\beta}^*$
may be derived from the joint probability distribution of K
and $\hat{\beta}^*$ which can be determined by

$$ f_{\hat{\beta}^*, K} = f_{\hat{\beta}^* | K} \, f_K \tag{6} $$

if the conditional distribution of $\hat{\beta}^*$ given K,
$f_{\hat{\beta}^* | K}$, and the marginal distribution of K,
$f_K$, are known. As stated above, when K is given, the
distribution of $\hat{\beta}^*$ is known. It remains to
determine the marginal of K, $f_K$. Clearly, this
distribution depends on how K is related to Y. The
procedure will be to find a mapping from the range of Y
into the range of K which gives the marginal distribution
of K. With this distribution and the known conditional
distribution of $\hat{\beta}^*$ given K, the joint
distribution of $\hat{\beta}^*$ and K may be determined. It
is convenient to consider the cumulative distribution
function, $F_K(k)$, since, if the functional relationship of
K to Y, K = h(Y), is known then

$$ F_K(k) = P[K \le k] = P[h(Y) \le k] = P[Y \in R_k] \tag{7} $$

where $R_k$ is a region in the space of Y corresponding to
$h(Y) \le k$. Thus if $R_k$ can be determined then, since the
marginal distribution of Y is known, $F_K(k) = P[Y \in R_k]$
can be determined and $f_K$ may be determined from $F_K$ by
differentiation. It remains to determine $R_k$ corresponding
to a specified region in the space of K and an objective
rule for mapping from Y to K.

A. ABSOLUTE VALUE CRITERION 

The practical range of the ridge parameter is taken to
be 0 < k < 1 in the literature. It seems reasonable then
to choose the smallest value of k such that all
$\hat{\beta}_i^*(k)$ are close to their respective values at
k = 1. In other words,

$$ |\hat{\beta}_i^*(k) - \hat{\beta}_i^*(1)| \le \delta_i; \quad i = 1, 2, \ldots, p \tag{8} $$

where $\delta_i$ is a constant selected by the analyst. The
criterion expressed by (8) means that the ridge trace
curves, $\hat{\beta}_i^*$, at k are within $\delta_i$ of their
value at k = 1 beyond which there is no interest. Here
$\delta_i$ refers to the i-th scalar component of a p x 1
vector, $\delta$. Suppose that at some $k = k_0$ the m-th
component of the left hand side of (8) is the one whose
absolute magnitude is largest. Define a p x 1 vector $\tau$
such that $\tau_m = \pm\delta_m$, as appropriate, and the
other components of $\tau$ are equal to the corresponding
values of $|\hat{\beta}_i^*(k_0) - \hat{\beta}_i^*(1)|$. Then
equation (8) can be rewritten in vector form

$$ \hat{\beta}^*(k_0) - \hat{\beta}^*(1) = \tau \tag{9} $$
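As a rough illustration of how criterion (8) maps a data set to a value of k, the sketch below searches a grid on [0, 1] for the smallest k that satisfies it. The restriction to p = 2, the grid search itself, the function names, and the inputs are all assumptions made for this example; the text does not prescribe a numerical procedure.

```python
def ridge_estimate(xtx, xty, k):
    """Return (X'X + kI)^-1 X'Y for p = 2 using the closed-form 2 x 2 inverse."""
    a, b = xtx[0][0] + k, xtx[0][1]
    c, d = xtx[1][0], xtx[1][1] + k
    det = a * d - b * c
    return [(d * xty[0] - b * xty[1]) / det,
            (a * xty[1] - c * xty[0]) / det]


def choose_k_absolute(xtx, xty, delta, steps=1000):
    """Smallest grid value of k in [0, 1] at which criterion (8) holds."""
    b_one = ridge_estimate(xtx, xty, 1.0)
    for i in range(steps + 1):
        k = i / steps
        b_k = ridge_estimate(xtx, xty, k)
        if all(abs(bk - b1) <= d for bk, b1, d in zip(b_k, b_one, delta)):
            return k
    return 1.0  # not reached in practice: k = 1 always satisfies (8)


# Hypothetical inputs and tolerances.
xtx = [[1.0, 0.9], [0.9, 1.0]]
xty = [0.8, 0.7]
delta = [0.1, 0.1]
k_star = choose_k_absolute(xtx, xty, delta)
print("selected k:", k_star)
```

Because the selected value depends on X'Y, and hence on Y, repeating the search on a new sample will generally return a different k.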
B. DERIVATIVE CRITERION

Another potential criterion to use for selecting k is
to require that the slopes of all $\hat{\beta}_i^*$ be
"flat enough" in the sense that

$$ \left| \frac{\partial \hat{\beta}_i^*(k)}{\partial k} \right| \le \delta_i; \quad i = 1, 2, \ldots, p \tag{10} $$

where $\delta_i$ is as previously defined. Define m such
that the m-th component of the left hand side of (10) is
the one whose absolute magnitude is largest and define a
p x 1 vector $\pi$ such that $\pi_m = \pm\delta_m$, as
appropriate, and the other components of $\pi$ are equal to
the corresponding values of
$|\partial \hat{\beta}_i^* / \partial k|$. Then equation (10)
can be written, in vector form

$$ \frac{\partial \hat{\beta}^*(k)}{\partial k} = \pi \tag{11} $$
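The derivative criterion can be sketched in the same way. Differentiating the ridge estimator with respect to k gives the slope $-(X'X + kI)^{-2} X'Y$, which is evaluated below as two successive solves. The restriction to p = 2, the grid search, the function names, and the inputs are assumptions made for illustration.

```python
def solve_shifted(xtx, rhs, k):
    """Return (X'X + kI)^-1 rhs for p = 2 using the closed-form 2 x 2 inverse."""
    a, b = xtx[0][0] + k, xtx[0][1]
    c, d = xtx[1][0], xtx[1][1] + k
    det = a * d - b * c
    return [(d * rhs[0] - b * rhs[1]) / det,
            (a * rhs[1] - c * rhs[0]) / det]


def ridge_slope(xtx, xty, k):
    """Slope of the ridge trace, -(X'X + kI)^-2 X'Y, as two successive solves."""
    b_k = solve_shifted(xtx, xty, k)
    return [-s for s in solve_shifted(xtx, b_k, k)]


def choose_k_derivative(xtx, xty, delta, steps=1000):
    """Smallest grid value of k in [0, 1] at which every |slope| <= delta_i."""
    for i in range(steps + 1):
        k = i / steps
        if all(abs(s) <= d for s, d in zip(ridge_slope(xtx, xty, k), delta)):
            return k
    return None  # criterion not met anywhere on the grid


# Hypothetical inputs and tolerances.
xtx = [[1.0, 0.9], [0.9, 1.0]]
xty = [0.8, 0.7]
delta = [0.5, 0.5]
k_flat = choose_k_derivative(xtx, xty, delta)
print("selected k:", k_flat)
```

Unlike the absolute value criterion, this rule has no guaranteed solution on [0, 1], so the sketch returns `None` when the tolerances are too tight.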
III. PROBLEM



The problem is to determine the probability 
distribution of K given Y. It is proposed to determine 
this by attempting to derive and examine the functional 
relationship of Y and K. 

A. ABSOLUTE VALUE CRITERION 

The criterion expressed by equation (9) may be stated,
by substituting from equation (4)

$$ (X'X + kI)^{-1} X'Y - (X'X + I)^{-1} X'Y = \tau \tag{12} $$

and by factoring

$$ [(X'X + kI)^{-1} - (X'X + I)^{-1}] X'Y = \tau \tag{13} $$

but, as shown in Appendix C, equation (C-4), the expression
in brackets may be expanded to

$$ (X'X + kI)^{-1} [(X'X + I) - (X'X + kI)] (X'X + I)^{-1} \tag{14} $$

Therefore, by canceling terms and simplifying, equation (13)
becomes

$$ (1 - k)(X'X + kI)^{-1} (X'X + I)^{-1} X'Y = \tau \tag{15} $$

If $k \neq 1$ and if $(X'X + kI)^{-1}$ and $(X'X + I)^{-1}$
exist, then

$$ X'Y = \frac{1}{1 - k} (X'X + kI)(X'X + I) \tau \tag{16} $$

The task then is to solve the linear equations in (16)
for Y in order to determine $R_k$. Unfortunately, equation
(16) represents p linear restraints (hyperplanes) on n
unknown variables where, in general, n > p. Furthermore,
$\tau$ is a function of Y. Thus, $R_k$ is not easily
determined under this criterion.
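The difficulty lies in the general case. In the degenerate case of a single standardized predictor (p = 1, so that X'X = 1), the mapping K = h(Y) and the region $R_k$ can be written down explicitly. This case is an assumption introduced here purely for illustration and is not treated in the original analysis. Writing c = X'Y, criterion (8) becomes $|c/(1+k) - c/2| \le \delta$, whose smallest solution is $K = \max\{0, (|c| - 2\delta)/(|c| + 2\delta)\}$, and $R_k$ is the set $|c| \le 2\delta(1+k)/(1-k)$. Since c is normal with mean $\beta$ and variance $\sigma^2$ under the stated model, $F_K$ follows from the normal distribution function. The sketch compares this closed form with a simulation; all names and numerical values are hypothetical.

```python
import math
import random


def k_from_c(c, delta):
    """K = h(Y) for p = 1: smallest k in [0, 1] with |c/(1+k) - c/2| <= delta."""
    return max(0.0, (abs(c) - 2 * delta) / (abs(c) + 2 * delta))


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cdf_K(k, beta, sigma, delta):
    """F_K(k) = P[|c| <= 2*delta*(1+k)/(1-k)] with c ~ N(beta, sigma^2)."""
    if k < 0:
        return 0.0
    if k >= 1:
        return 1.0
    t = 2 * delta * (1 + k) / (1 - k)
    return normal_cdf((t - beta) / sigma) - normal_cdf((-t - beta) / sigma)


# Hypothetical values: true coefficient 1.0, error s.d. 0.5, tolerance 0.1.
beta, sigma, delta = 1.0, 0.5, 0.1
random.seed(0)
draws = [k_from_c(random.gauss(beta, sigma), delta) for _ in range(20000)]

for k in (0.0, 0.25, 0.5, 0.75):
    empirical = sum(d <= k for d in draws) / len(draws)
    print(f"k={k:4.2f}  closed form={cdf_K(k, beta, sigma, delta):.4f}"
          f"  simulated={empirical:.4f}")
```

Note that K has a point mass at zero, so $F_K$ is not differentiable there. For p > 1 no comparable closed form is available, but the simulation step alone still applies once a rule h(Y) is fixed.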

B. DERIVATIVE CRITERION 

The criterion given by equation (11) may be stated by
substituting from equation (4)

$$ \frac{\partial}{\partial k} [(X'X + kI)^{-1} X'Y] = \pi \tag{17} $$

or, since $\frac{\partial}{\partial k}(X'X + kI) = I$,

$$ -(X'X + kI)^{-2} X'Y = \pi \tag{18} $$

Now, if (X'X + kI) is not singular then

$$ X'Y = (X'X + kI)^{2} \pi \tag{19} $$

where the negative sign has been dropped since the
criterion actually specifies the absolute value of the
components of the derivative and the notation of $\pi$
accounts for proper signs.

Equation (19) is similar to equation (16), as it should
be since the criteria are similar, and the same difficulties
are encountered in determining $R_k$ as for the previous
criterion. In addition, the derivative of $\pi$ will be
difficult to determine. Therefore, the derivative
criterion does not lead to a useful result either.
IV. NOTES ON THE FULL BAYESIAN RIDGE ESTIMATOR



The full Bayesian ridge estimator (FBRE) is suggested
by Eskew [Ref. 9] and is given as

$$ \hat{\beta}^* = (X'X + kI)^{-1} (X'Y + k\beta_0) \tag{20} $$

where $\beta_0$ is a prior estimate of $\beta$. There are
two interesting properties of $\hat{\beta}^*$ not noted by
Eskew.

First suppose that the prior $\beta_0$ is chosen to be the
OLS estimate $\hat{\beta}$. Then

$$ \hat{\beta}^* = (X'X + kI)^{-1} [X'Y + k(X'X)^{-1} X'Y] \tag{21} $$

and hence

$$ \hat{\beta}^* = (X'X + kI)^{-1} [I + k(X'X)^{-1}] X'Y \tag{22} $$

But

$$ [I + k(X'X)^{-1}] = (X'X + kI)(X'X)^{-1} \tag{23} $$

Substituting (23) into (22)

$$ \hat{\beta}^* = (X'X)^{-1} X'Y = \hat{\beta} \tag{24} $$

Thus if the OLS estimator is used as a prior estimate
for the FBRE, equation (20), then the resulting estimate is
equal to the OLS estimate.
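This first property is easy to confirm numerically. The sketch below assumes p = 2 and hypothetical values of X'X, X'Y, and k; the function names are illustrative.

```python
def solve2(m, rhs):
    """Solve the 2 x 2 system m x = rhs using the closed-form inverse."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(d * rhs[0] - b * rhs[1]) / det,
            (a * rhs[1] - c * rhs[0]) / det]


def fbre(xtx, xty, k, prior):
    """Full Bayesian ridge estimator (X'X + kI)^-1 (X'Y + k * prior) for p = 2."""
    shifted = [[xtx[0][0] + k, xtx[0][1]], [xtx[1][0], xtx[1][1] + k]]
    rhs = [xty[i] + k * prior[i] for i in range(2)]
    return solve2(shifted, rhs)


# Hypothetical inputs.
xtx = [[1.0, 0.9], [0.9, 1.0]]
xty = [0.8, 0.7]

ols = solve2(xtx, xty)            # OLS estimate
same = fbre(xtx, xty, 0.3, ols)   # FBRE with the OLS estimate as prior
print("OLS :", [round(b, 6) for b in ols])
print("FBRE:", [round(b, 6) for b in same])
```

A zero prior recovers the ordinary ridge estimator, so `fbre(xtx, xty, k, [0.0, 0.0])` can serve as a ridge routine as well.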

Now, suppose that any prior estimate $\beta_0$ is used in
equation (20) but the resulting estimate is then used as a
prior in (20) to compute another estimate. If this
procedure is repeated indefinitely, in the limit the result
will again be the OLS estimator regardless of what prior,
$\beta_0$, was initially used. The proof of this is shown in
Appendix B.
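The limiting behavior can also be observed by direct iteration. The sketch assumes p = 2; the starting prior, the value of k, and the function names are hypothetical choices for illustration.

```python
def solve2(m, rhs):
    """Solve the 2 x 2 system m x = rhs using the closed-form inverse."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(d * rhs[0] - b * rhs[1]) / det,
            (a * rhs[1] - c * rhs[0]) / det]


def fbre(xtx, xty, k, prior):
    """Full Bayesian ridge estimator (X'X + kI)^-1 (X'Y + k * prior) for p = 2."""
    shifted = [[xtx[0][0] + k, xtx[0][1]], [xtx[1][0], xtx[1][1] + k]]
    rhs = [xty[i] + k * prior[i] for i in range(2)]
    return solve2(shifted, rhs)


def iterate_fbre(xtx, xty, k, prior, m):
    """Apply the FBRE m times, feeding each result back in as the next prior."""
    b = list(prior)
    for _ in range(m):
        b = fbre(xtx, xty, k, b)
    return b


# Hypothetical inputs and a deliberately poor starting prior.
xtx = [[1.0, 0.9], [0.9, 1.0]]
xty = [0.8, 0.7]
k, prior = 0.1, [5.0, -5.0]

ols = solve2(xtx, xty)
for m in (1, 5, 20, 200):
    b = iterate_fbre(xtx, xty, k, prior, m)
    print(f"m={m:3d}  estimate={[round(v, 6) for v in b]}")
print("OLS    estimate=", [round(v, 6) for v in ols])
limit = iterate_fbre(xtx, xty, k, prior, 200)
```

Each pass shrinks the remaining error along an eigenvector of X'X by the factor k/(eigenvalue + k), so convergence is slowest along the direction of the smallest eigenvalue.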
V. CONCLUSIONS AND RECOMMENDATIONS



A. CONCLUSIONS 

The determination of a probability distribution of the
ridge estimator, $\hat{\beta}^*$, is desirable in order to
facilitate the use of hypothesis tests and the computation
of confidence bounds concerning $\hat{\beta}^*$. The
probability distribution of $\hat{\beta}^*$ depends on the
objective rule used to select the ridge parameter, k.
Neither of the two objective rules examined here appears
to lead to a simply determined probability distribution.

B. RECOMMENDATIONS 

The search for a useful probability distribution of
$\hat{\beta}^*$ should be pursued further. In particular,
the closed form solution for k presented by Hemmerle
[Ref. 7] may prove fruitful. Other possibilities include
investigating other criteria based on the ridge trace such
as minimizing the sum of squares, over all
i = 1, 2, ..., p, of the difference between
$\hat{\beta}_i^*(k)$ and $\hat{\beta}_i^*(1)$. Also, the same
criteria applied to the ridge trace could be considered
for the squared length of $\hat{\beta}^*$.
APPENDIX A



DERIVATION OF THE RIDGE REGRESSION ESTIMATOR 

The residual sum of squares for any estimator can be
written

$$ \phi(\hat{\beta}) = (Y - X\hat{\beta})'(Y - X\hat{\beta}) = e'e \tag{A-1} $$

In ridge regression it is desirable to minimize the
residual sum of squares subject to an acceptable length,
c, of the regression vector $\hat{\beta}^*$. Expressed as a
Lagrangian restraint problem this is

$$ \min \phi'(\hat{\beta}^*) = (Y - X\hat{\beta}^*)'(Y - X\hat{\beta}^*) + k(\hat{\beta}^{*\prime}\hat{\beta}^* - c^2) \tag{A-2} $$

where k is the Lagrangian multiplier. Taking partial
derivatives of $\phi'$ with respect to $\hat{\beta}^*$ and
setting them equal to zero

$$ \frac{\partial \phi'}{\partial \hat{\beta}^*} = 0 = \frac{\partial}{\partial \hat{\beta}^*} [Y'Y - Y'X\hat{\beta}^* - \hat{\beta}^{*\prime}X'Y + \hat{\beta}^{*\prime}X'X\hat{\beta}^* + k\hat{\beta}^{*\prime}\hat{\beta}^*] \tag{A-3} $$

Hence

$$ 0 = -(Y'X)' - X'Y + 2X'X\hat{\beta}^* + 2k\hat{\beta}^* \tag{A-4} $$

or

$$ 2X'Y = 2X'X\hat{\beta}^* + 2kI\hat{\beta}^* \tag{A-5} $$

Therefore,

$$ X'Y = (X'X + kI)\hat{\beta}^* \tag{A-6} $$

Now, if (X'X + kI) is non-singular (which k is selected
to ensure), then

$$ \hat{\beta}^* = (X'X + kI)^{-1} X'Y \tag{A-7} $$
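The derivation can be checked numerically by confirming that small perturbations of the solution of the normal equations (A-6) increase the penalized objective of (A-2). The sketch assumes p = 2 and uses a small made-up data set; the data are not standardized, which does not matter for this check. All names and values are hypothetical.

```python
def penalized_rss(X, y, b, k):
    """(Y - Xb)'(Y - Xb) + k * b'b, the objective in (A-2) without the constant."""
    rss = sum((yi - sum(xij * bj for xij, bj in zip(row, b))) ** 2
              for row, yi in zip(X, y))
    return rss + k * sum(bj * bj for bj in b)


def ridge_from_data(X, y, k):
    """Solve the normal equations (X'X + kI) b = X'Y for p = 2."""
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]
    a, b = xtx[0][0] + k, xtx[0][1]
    c, d = xtx[1][0], xtx[1][1] + k
    det = a * d - b * c
    return [(d * xty[0] - b * xty[1]) / det,
            (a * xty[1] - c * xty[0]) / det]


# Hypothetical data: four observations on two highly correlated predictors.
X = [[1.0, 0.9], [-1.0, -1.1], [0.5, 0.6], [-0.5, -0.4]]
y = [1.0, -1.2, 0.4, -0.3]
k = 0.5

b_star = ridge_from_data(X, y, k)
best = penalized_rss(X, y, b_star, k)
steps = [(0.01, 0.0), (-0.01, 0.0), (0.0, 0.01), (0.0, -0.01)]
increases = [penalized_rss(X, y, [b_star[0] + s0, b_star[1] + s1], k) - best
             for s0, s1 in steps]
print("ridge solution:", [round(v, 4) for v in b_star])
print("objective increases:", [round(v, 6) for v in increases])
```

Every perturbation raises the objective, as expected for the minimizer of a strictly convex quadratic when k > 0.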
APPENDIX B



FULL BAYESIAN RIDGE ESTIMATION 



A. BACKGROUND 

Eskew [Ref. 9] points out that ridge estimation is
equivalent to minimizing the squared differences between
the regression estimates and a prior estimate of zero
subject to a constraint on the sum of squares and suggests
that a non-zero prior might be more reasonable. Following
this line of reasoning he derives the full Bayesian ridge
estimator (FBRE)

$$ \hat{\beta}^* = (X'X + kI)^{-1} (X'Y + k\beta_0) \tag{B-1} $$

where $\beta_0$ is a prior estimate of the true parameters
$\beta$. Note that the ridge estimator is a special case of
FBRE where the prior is taken to be zero.

Eskew shows that the variance of the FBRE is the same
as the variance of the ridge regression estimator (RRE)
while the squared bias of the FBRE is less than that for
the RRE, thereby resulting in a reduction of mean squared
error.

B. ITERATIVE USE OF THE FULL BAYESIAN RIDGE ESTIMATOR

Suppose that the FBRE is calculated using any prior,
$\beta_0$, and then the result, $\hat{\beta}_1^*$, is used as
a prior to calculate another FBRE, $\hat{\beta}_2^*$. If
this procedure is repeated m times the result may be
written

$$ \hat{\beta}_m^* = (1/k) \sum_{i=1}^{m} (kA)^i X'Y + (kA)^m \beta_0 \tag{B-2} $$

where $A = (X'X + kI)^{-1}$. It is interesting to determine
the form of $\hat{\beta}_m^*$ in the limit as m approaches
infinity. Since A and X'X are positive definite matrices
their eigenvalues are positive. Let $\lambda_i > 0$ be an
eigenvalue of A and $\rho_i > 0$ be an eigenvalue of X'X.
Hoerl and Kennard show the relationship between
$\lambda_i$ and $\rho_i$ to be

$$ \lambda_i = 1/(\rho_i + k) \tag{B-3} $$

Now there exists an orthogonal p x p matrix P with P'P = I
such that

$$ P'AP = \mathrm{diag}(\lambda_1, \ldots, \lambda_p) \tag{B-4} $$

or since the eigenvalues of kA are $k\lambda_i$ and the
eigenvalues of $A^m$ are $\lambda_i^m$

$$ P'(kA)^m P = \mathrm{diag}(k^m\lambda_1^m, k^m\lambda_2^m, \ldots, k^m\lambda_p^m) \tag{B-5} $$

Now

$$ P'[\lim_{m\to\infty} (kA)^m]P = \lim_{m\to\infty} P'(kA)^m P \tag{B-6} $$

The right hand side of (B-6) is the limit of the right hand
side of (B-5). By substituting from equation (B-3) a
typical diagonal element is $0 < [k/(\rho_i + k)]^m < 1$,
since $\rho_i > 0$ for all i = 1, 2, ..., p. Therefore, each
of the elements of the right hand side of (B-5) approaches
zero as m approaches infinity. Hence

$$ P'[\lim_{m\to\infty} (kA)^m]P = 0 \tag{B-7} $$

This can only occur if

$$ \lim_{m\to\infty} (kA)^m = 0 \tag{B-8} $$

Therefore, the last term of equation (B-2) is zero in the
limit. Now define a matrix function S = S(kA) where

$$ S = \sum_{i=1}^{\infty} (kA)^i \tag{B-9} $$

DeRusso, Roy, and Close [Ref. 10] show that S(kA) converges
if and only if $S(k\lambda_i)$ converges for all
$k\lambda_i$, the eigenvalues of kA. Clearly this will occur
if and only if

$$ |k\lambda_{max}| < 1 \tag{B-10} $$

Substituting equation (B-3)

$$ \left| \frac{k}{\rho_{min} + k} \right| < 1 \tag{B-11} $$

or, after some algebra

$$ \rho_{min} > 0 \quad \text{and} \quad \rho_{min} > -2k \tag{B-12} $$

Since $\rho_{min}$ is an eigenvalue of a positive definite
matrix, X'X, then $\rho_{min} > 0$ and both conditions of
(B-12) are met. Therefore S(kA) does converge. To see what
it converges to, define S' = S + I and multiply S' on the
left by (I - kA)

$$ (I - kA)S' = (I - kA)(I + kA + (kA)^2 + \ldots) \tag{B-13} $$

and multiplying the right hand side out

$$ (I - kA)S' = [I + kA + (kA)^2 + \ldots] - [kA + (kA)^2 + \ldots] = I \tag{B-14} $$

Then

$$ S' = (I - kA)^{-1} \tag{B-15} $$

Then

$$ S = [[(kA)^{-1} - I]kA]^{-1} - I = (1/k)A^{-1}[(1/k)A^{-1} - I]^{-1} - I \tag{B-16} $$

Substituting $A = (X'X + kI)^{-1}$

$$ S = [(1/k)X'X + I][(1/k)X'X]^{-1} - I = k(X'X)^{-1} \tag{B-17} $$

Substituting S into equation (B-2)

$$ \lim_{m\to\infty} \hat{\beta}_m^* = (1/k) k (X'X)^{-1} X'Y \tag{B-18} $$

Therefore

$$ \lim_{m\to\infty} \hat{\beta}_m^* = (X'X)^{-1} X'Y = \hat{\beta} \tag{B-19} $$

Thus the iterative procedure, starting with any prior
$\beta_0$, converges to the OLS estimator, $\hat{\beta}$.
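As a cross-check on the limit, and not as part of the original proof, the same result can be read off one eigenvalue at a time. For an eigenvalue $\rho_i > 0$ of X'X and k > 0, the corresponding eigenvalue of kA is $k/(\rho_i + k)$, and the matrix series reduces to a scalar geometric series:

```latex
\sum_{j=1}^{\infty} \left( \frac{k}{\rho_i + k} \right)^{j}
  = \frac{k/(\rho_i + k)}{1 - k/(\rho_i + k)}
  = \frac{k}{\rho_i},
\qquad
\frac{1}{k} \cdot \frac{k}{\rho_i} = \frac{1}{\rho_i}.
```

Here $k/\rho_i$ is the corresponding eigenvalue of $k(X'X)^{-1}$, and $1/\rho_i$ is that of $(X'X)^{-1}$, the matrix that appears in the OLS estimator.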
APPENDIX C



MISCELLANEOUS MATRIX ALGEBRA AND CALCULUS 




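Let A, B, and C denote n x n nonsingular matrices. Denote
their inverses by $A^{-1}$, $B^{-1}$, and $C^{-1}$,
respectively.

A. MATRIX ALGEBRA

First, note that

$$ C(A + B)^{-1} = (AC^{-1} + BC^{-1})^{-1} \tag{C-1} $$

since

$$ C(A + B)^{-1} = [(A + B)C^{-1}]^{-1} \tag{C-2} $$

$$ = (AC^{-1} + BC^{-1})^{-1} \tag{C-3} $$

Also

$$ A^{-1} \pm B^{-1} = A^{-1}(B \pm A)B^{-1} = B^{-1}(B \pm A)A^{-1} \tag{C-4} $$

since

$$ A^{-1}(B \pm A)B^{-1} = (A^{-1}B \pm I)B^{-1} = (A^{-1} \pm B^{-1}) \tag{C-5} $$

and

$$ B^{-1}(B \pm A)A^{-1} = (I \pm B^{-1}A)A^{-1} = (A^{-1} \pm B^{-1}) \tag{C-6} $$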
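The identity (C-4) with the minus sign can be spot-checked for the pair A = X'X + kI and B = X'X + I, for which B - A = (1 - k)I. The sketch assumes 2 x 2 matrices and hypothetical values; the helper names are illustrative.

```python
def inv2(m):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(m, n):
    """Product of two 2 x 2 matrices."""
    return [[sum(m[i][t] * n[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]


def shifted(xtx, k):
    """Return X'X + kI for a 2 x 2 X'X."""
    return [[xtx[i][j] + (k if i == j else 0.0) for j in range(2)]
            for i in range(2)]


# Hypothetical inputs.
xtx = [[1.0, 0.9], [0.9, 1.0]]
k = 0.25
A, B = shifted(xtx, k), shifted(xtx, 1.0)
A_inv, B_inv = inv2(A), inv2(B)

lhs = [[A_inv[i][j] - B_inv[i][j] for j in range(2)] for i in range(2)]
B_minus_A = [[B[i][j] - A[i][j] for j in range(2)] for i in range(2)]
rhs_c4 = matmul2(matmul2(A_inv, B_minus_A), B_inv)   # A^-1 (B - A) B^-1
rhs_scaled = [[(1 - k) * v for v in row]
              for row in matmul2(A_inv, B_inv)]      # (1 - k) A^-1 B^-1

gap = max(abs(lhs[i][j] - rhs_c4[i][j]) + abs(lhs[i][j] - rhs_scaled[i][j])
          for i in range(2) for j in range(2))
print("largest discrepancy:", gap)
```

The three expressions agree to rounding error, which is the simplification that introduces the scalar factor (1 - k).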



B. MATRIX CALCULUS 

Let A(t), B(t), and C(t) denote n x n matrices whose
elements may be functions of the scalar variable t. Let
$\dot{A}(t)$ and $\dot{B}(t)$ denote the derivatives of A(t)
and B(t), respectively, with respect to t.

The following are shown to be true by DeRusso, Roy,
and Close [Ref. 10].

$$ \frac{d}{dt} [A(t)B(t)] = \dot{A}(t)B(t) + A(t)\dot{B}(t) \tag{C-7} $$

and

$$ \frac{d}{dt} A^{-1}(t) = -A^{-1}(t)\dot{A}(t)A^{-1}(t) \tag{C-8} $$
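Equation (C-8) can be checked by finite differences for A(t) = X'X + tI, whose derivative with respect to t is the identity, so that the right hand side reduces to the negative of the squared inverse. The sketch assumes a 2 x 2 matrix, hypothetical values, and a central difference with an arbitrarily chosen step.

```python
def inv2(m):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(m, n):
    """Product of two 2 x 2 matrices."""
    return [[sum(m[i][s] * n[s][j] for s in range(2)) for j in range(2)]
            for i in range(2)]


def shifted(xtx, t):
    """Return A(t) = X'X + tI for a 2 x 2 X'X."""
    return [[xtx[i][j] + (t if i == j else 0.0) for j in range(2)]
            for i in range(2)]


# Hypothetical inputs and step size.
xtx = [[1.0, 0.9], [0.9, 1.0]]
t, h = 0.25, 1e-6

up, dn = inv2(shifted(xtx, t + h)), inv2(shifted(xtx, t - h))
finite_diff = [[(up[i][j] - dn[i][j]) / (2 * h) for j in range(2)]
               for i in range(2)]

A_inv = inv2(shifted(xtx, t))
analytic = [[-v for v in row] for row in matmul2(A_inv, A_inv)]  # dA/dt = I

gap = max(abs(finite_diff[i][j] - analytic[i][j])
          for i in range(2) for j in range(2))
print("largest discrepancy:", gap)
```

The same calculation, applied to the inverse multiplied by X'Y, gives the slope of the ridge trace with respect to the ridge parameter.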